自动驾驶的运动预测是一项艰巨的任务,因为复杂的驾驶场景导致静态和动态输入的异质组合。这是一个开放的问题,如何最好地表示和融合有关道路几何,车道连接,时变的交通信号状态以及动态代理的历史及其相互作用的历史。为了模拟这一不同的输入功能集,许多提出的方法旨在设计具有多种模态模块的同样复杂系统。这导致难以按严格的方式进行扩展,扩展或调整的系统以进行质量和效率。在本文中,我们介绍了Wayformer,这是一个基于注意力的运动架构,用于运动预测,简单而均匀。 Wayformer提供了一个紧凑的模型描述,该描述由基于注意力的场景编码器和解码器组成。在场景编码器中,我们研究了输入方式的早期,晚和等级融合的选择。对于每种融合类型,我们通过分解的注意力或潜在的查询关注来探索策略来折衷效率和质量。我们表明,尽管早期融合的结构简单,但不仅是情感不可知论,而且还取得了最先进的结果。
translated by 谷歌翻译
In this paper, we explore the inclusion of latent random variables into the hidden state of a recurrent neural network (RNN) by combining the elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) 1 can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against other related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamics.
translated by 谷歌翻译
Radiance Fields (RF) are popular to represent casually-captured scenes for new view generation and have been used for applications beyond it. Understanding and manipulating scenes represented as RFs have to naturally follow to facilitate mixed reality on personal spaces. Semantic segmentation of objects in the 3D scene is an important step for that. Prior segmentation efforts using feature distillation show promise but don't scale to complex objects with diverse appearance. We present a framework to interactively segment objects with fine structure. Nearest neighbor feature matching identifies high-confidence regions of the objects using distilled features. Bilateral filtering in a joint spatio-semantic space grows the region to recover accurate segmentation. We show state-of-the-art results of segmenting objects from RFs and compositing them to another scene, changing appearance, etc., moving closer to rich scene manipulation and understanding. Project Page: https://rahul-goel.github.io/isrf/
translated by 谷歌翻译
Text classifiers have promising applications in high-stake tasks such as resume screening and content moderation. These classifiers must be fair and avoid discriminatory decisions by being invariant to perturbations of sensitive attributes such as gender or ethnicity. However, there is a gap between human intuition about these perturbations and the formal similarity specifications capturing them. While existing research has started to address this gap, current methods are based on hardcoded word replacements, resulting in specifications with limited expressivity or ones that fail to fully align with human intuition (e.g., in cases of asymmetric counterfactuals). This work proposes novel methods for bridging this gap by discovering expressive and intuitive individual fairness specifications. We show how to leverage unsupervised style transfer and GPT-3's zero-shot capabilities to automatically generate expressive candidate pairs of semantically similar sentences that differ along sensitive attributes. We then validate the generated pairs via an extensive crowdsourcing study, which confirms that a lot of these pairs align with human intuition about fairness in the context of toxicity classification. Finally, we show how limited amounts of human feedback can be leveraged to learn a similarity specification that can be used to train downstream fairness-aware models.
translated by 谷歌翻译
Stylized view generation of scenes captured casually using a camera has received much attention recently. The geometry and appearance of the scene are typically captured as neural point sets or neural radiance fields in the previous work. An image stylization method is used to stylize the captured appearance by training its network jointly or iteratively with the structure capture network. The state-of-the-art SNeRF method trains the NeRF and stylization network in an alternating manner. These methods have high training time and require joint optimization. In this work, we present StyleTRF, a compact, quick-to-optimize strategy for stylized view generation using TensoRF. The appearance part is fine-tuned using sparse stylized priors of a few views rendered using the TensoRF representation for a few iterations. Our method thus effectively decouples style-adaption from view capture and is much faster than the previous methods. We show state-of-the-art results on several scenes used for this purpose.
translated by 谷歌翻译
Indian e-commerce industry has evolved over the last decade and is expected to grow over the next few years. The focus has now shifted to turnaround time (TAT) due to the emergence of many third-party logistics providers and higher customer expectations. The key consideration for delivery providers is to balance their overall operating costs while meeting the promised TAT to their customers. E-commerce delivery partners operate through a network of facilities whose strategic locations help to run the operations efficiently. In this work, we identify the locations of hubs throughout the country and their corresponding mapping with the distribution centers. The objective is to minimize the total network costs with TAT adherence. We use Genetic Algorithm and leverage business constraints to reduce the solution search space and hence the solution time. The results indicate an improvement of 9.73% in TAT compliance compared with the current scenario.
translated by 谷歌翻译
Active target sensing is the task of discovering and classifying an unknown number of targets in an environment and is critical in search-and-rescue missions. This paper develops a deep reinforcement learning approach to plan informative trajectories that increase the likelihood for an uncrewed aerial vehicle (UAV) to discover missing targets. Our approach efficiently (1) explores the environment to discover new targets, (2) exploits its current belief of the target states and incorporates inaccurate sensor models for high-fidelity classification, and (3) generates dynamically feasible trajectories for an agile UAV by employing a motion primitive library. Extensive simulations on randomly generated environments show that our approach is more efficient in discovering and classifying targets than several other baselines. A unique characteristic of our approach, in contrast to heuristic informative path planning approaches, is that it is robust to varying amounts of deviations of the prior belief from the true target distribution, thereby alleviating the challenge of designing heuristics specific to the application conditions.
translated by 谷歌翻译
Modern virtual assistants use internal semantic parsing engines to convert user utterances to actionable commands. However, prior work has demonstrated that semantic parsing is a difficult multilingual transfer task with low transfer efficiency compared to other tasks. In global markets such as India and Latin America, this is a critical issue as switching between languages is prevalent for bilingual users. In this work we dramatically improve the zero-shot performance of a multilingual and codeswitched semantic parsing system using two stages of multilingual alignment. First, we show that constrastive alignment pretraining improves both English performance and transfer efficiency. We then introduce a constrained optimization approach for hyperparameter-free adversarial alignment during finetuning. Our Doubly Aligned Multilingual Parser (DAMP) improves mBERT transfer performance by 3x, 6x, and 81x on the Spanglish, Hinglish and Multilingual Task Oriented Parsing benchmarks respectively and outperforms XLM-R and mT5-Large using 3.2x fewer parameters.
translated by 谷歌翻译
Over the years, Machine Learning models have been successfully employed on neuroimaging data for accurately predicting brain age. Deviations from the healthy brain aging pattern are associated to the accelerated brain aging and brain abnormalities. Hence, efficient and accurate diagnosis techniques are required for eliciting accurate brain age estimations. Several contributions have been reported in the past for this purpose, resorting to different data-driven modeling methods. Recently, deep neural networks (also referred to as deep learning) have become prevalent in manifold neuroimaging studies, including brain age estimation. In this review, we offer a comprehensive analysis of the literature related to the adoption of deep learning for brain age estimation with neuroimaging data. We detail and analyze different deep learning architectures used for this application, pausing at research works published to date quantitatively exploring their application. We also examine different brain age estimation frameworks, comparatively exposing their advantages and weaknesses. Finally, the review concludes with an outlook towards future directions that should be followed by prospective studies. The ultimate goal of this paper is to establish a common and informed reference for newcomers and experienced researchers willing to approach brain age estimation by using deep learning models
translated by 谷歌翻译
Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g. fruits, glasses, phones, etc.) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.
translated by 谷歌翻译